19 research outputs found

    Pairwise gene GO-based measures for biclustering of high-dimensional expression data

    Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. The biological knowledge now available in public repositories can be used to drive these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes that establishes their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information. The merit function includes a term that incorporates this information through a GO measure. Results: The effect of two different gene pairwise GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes each are studied. Second, a group of human datasets related to clinical cancer data is explored by the algorithm. Most of these are high-dimensional datasets composed of a very large number of genes. The resulting biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective. Conclusions: The integration of biological information improves the performance of the biclustering process. Both GO measures studied improve the results obtained for the yeast datasets. However, when datasets are composed of a very large number of genes, only one of them actually improves the algorithm's performance. This second case constitutes a clear option for exploring interesting datasets from a clinical point of view.
    Funding: Ministerio de Economía y Competitividad TIN2014-55894-C2-

    Biclustering of gene expression data based on scatter search (Biclustering sobre datos de expresión génica basado en búsqueda dispersa)

    Gene expression data, with their particular nature and importance, motivate not only the development of new techniques but also the formulation of new problems such as biclustering. Biclustering is an unsupervised learning technique that groups both genes and conditions. This two-way grouping distinguishes it from traditional clustering on this kind of data, which groups either genes or conditions but not both. This thesis presents a new biclustering algorithm that allows different search criteria to be studied. The algorithm uses a scatter search scheme, which decouples the search mechanism from the criterion employed. Three different search criteria have been studied, and they motivate the three main contributions of the thesis. First, the linear correlation among genes is studied and integrated into the objective function used by the biclustering algorithm. Linear correlation makes it possible to find biclusters with shifting and scaling patterns, which improves on earlier proposals. Second, motivated by the biological significance of activation-inhibition patterns among genes, the linear correlation is modified so that these patterns are taken into account. Finally, the information about genes available in public repositories, such as the Gene Ontology (GO), is incorporated into the search criterion. An extra term is added that reflects, for each bicluster evaluated, the quality of that group of genes according to their information stored in GO. Two options for this biological information integration term are studied and compared with each other, and the results are shown to be better when biological information is used in the biclustering algorithm.
    The three contributions described, together with a series of intermediate steps, have led to results published in journals and in national and international conferences.

    Databases Reduction Simultaneously by Ordered Projection

    In this paper, a new algorithm, Database Reduction Simultaneously by Ordered Projections (RESOP), is introduced. This algorithm reduces databases in two directions: editing examples and selecting features simultaneously. Ordered projection techniques have been used to design RESOP, taking advantage of symmetrical ideas for two different tasks. Experiments have been carried out with UCI Repository databases, and the performance of classification techniques applied afterwards has been satisfactory.

    Biclustering of Gene Expression Data Based on SimUI Semantic Similarity Measure

    Biclustering is an unsupervised machine learning technique that simultaneously clusters genes and conditions in gene expression data. Gene Ontology (GO) is usually used in this context to validate the biological relevance of the results. However, although the integration of biological information from different sources is one of the research directions in Bioinformatics, GO is not used as input data in biclustering. This paper presents a scatter search-based algorithm that integrates GO information during the biclustering search process. SimUI is a GO semantic similarity measure that defines a distance between two genes. The algorithm optimizes a fitness function that uses SimUI to integrate the biological information stored in GO. Experimental results analyze the effect of integrating the biological information through this measure. A SimUI fitness function configuration is experimentally studied in a scatter search-based biclustering algorithm.
    Funding: Ministerio de Ciencia e Innovación TIN2011-28956-C02-02; Ministerio de Ciencia e Innovación TIN2014-55894-C2-R; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
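As a rough illustration of the kind of measure the abstract refers to, SimUI is commonly defined as the size of the intersection over the size of the union of the two genes' GO term sets, where each set includes all ancestor terms. The sketch below uses a hypothetical toy hierarchy and made-up term names (`root`, `t1`, ...), not real GO identifiers, and it is not the paper's implementation. Turning the similarity into a distance as `1 - similarity` is an assumption made here for illustration.

```python
def ancestors_closure(terms, parents):
    """Return the given terms plus all of their ancestors in a GO-like DAG."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def sim_ui(terms_a, terms_b, parents):
    """Union-intersection similarity: |A ∩ B| / |A ∪ B| on ancestor-closed sets."""
    a = ancestors_closure(terms_a, parents)
    b = ancestors_closure(terms_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy hierarchy: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1"],
    "t4": ["t1", "t2"],
}

gene_x = {"t3"}  # closure: {t3, t1, root}
gene_y = {"t4"}  # closure: {t4, t1, t2, root}

similarity = sim_ui(gene_x, gene_y, PARENTS)  # 2 shared terms out of 5
distance = 1.0 - similarity                   # assumed conversion to a distance
print(similarity, distance)
```

Two genes annotated with identical terms get similarity 1, and genes that share only the root term get a value close to 0 as their annotation sets grow.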

    A Measure for Data Set Editing by Ordered Projections

    In this paper we study a measure, named the weakness of an example, which allows us to establish the importance of an example for finding representative patterns in the data set editing problem. Our approach consists of reducing the database size without losing information, using the patterns by ordered projections algorithm. The idea is to relax the reduction factor with a new parameter, λ, removing all examples of the database whose weakness satisfies a condition on this λ. We study how to establish this new parameter. Our experiments have been carried out using all databases from the UCI Repository, and they show that a size reduction is possible in complex databases without a notable increase in the error rate.

    Inference of Gene Association Networks Guided by Semantic Similarity (Inferencia de Redes de Asociación de Genes Guiada por Similitud Semántica)

    This work proposes the use of prior knowledge as a heuristic in methods for inferring gene networks from expression data obtained with microarray technology. We use Gene Ontology [15] as the source of prior knowledge. This repository draws on annotation information about relationships in genetic material based on scientific evidence. Specifically, we propose the use of semantic similarity measures, in particular the SimGIC measure, in a regression-based inference method. The proposal is compared against the same method without information integration and against other classical methods, obtaining improvements in some cases and comparable results in others.
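For orientation, SimGIC is usually described as a weighted Jaccard index: the summed information content (IC) of the GO terms two genes share, divided by the summed IC of all terms annotating either gene. The sketch below assumes the term sets are already closed under ancestors and uses hypothetical term names and annotation frequencies. It only illustrates the measure and says nothing about how the work plugs it into the regression-based method.

```python
from math import log


def information_content(term_probability):
    """IC(t) = -log p(t), where p(t) is the annotation frequency of term t."""
    return {term: -log(p) for term, p in term_probability.items()}


def sim_gic(terms_a, terms_b, ic):
    """Weighted Jaccard: IC of shared terms divided by IC of all terms."""
    shared = sum(ic[t] for t in terms_a & terms_b)
    total = sum(ic[t] for t in terms_a | terms_b)
    return shared / total if total > 0 else 0.0


# Hypothetical annotation frequencies; the root annotates every gene (IC = 0).
IC = information_content({"root": 1.0, "t1": 0.5, "t2": 0.25, "t3": 0.25})

gene_a = {"root", "t1", "t2"}  # assumed to be ancestor-closed already
gene_b = {"root", "t1", "t3"}

print(sim_gic(gene_a, gene_b, IC))
```

Compared with a plain union-intersection ratio, rare (more specific) shared terms raise the score more than common ones.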

    Evolutionary Metaheuristic for Biclustering based on Linear Correlations among Genes

    A new measure to evaluate the quality of a bicluster is proposed in this paper. This measure is based on correlations among genes. Moreover, a new evolutionary metaheuristic based on Scatter Search, which uses this measure as the fitness function, is presented to obtain biclusters that contain groups of highly correlated genes. The correlation matrix of these biclusters is then analyzed to select the groups of genes that define new biclusters with shifting and scaling patterns. Experimental results from human B-cell lymphoma data are presented.
    Funding: Ministerio de Ciencia e Innovación TIN2007-68084-C02; Junta de Andalucía P07-TIC-0261

    Biclustering of Gene Expression Data by Correlation-Based Scatter Search

    BACKGROUND: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes that are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. It is nevertheless important to discover this type of pattern, since genes commonly show similar behavior even though their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions that are chosen according to quality and diversity criteria. This paper presents a Scatter Search aimed at finding biclusters in gene expression data. In this algorithm the proposed fitness function is based on the linear correlation among genes in order to detect shifting and scaling patterns, and an improvement method is included in order to select only positively correlated genes. RESULTS: The proposed algorithm has been tested with three real data sets (the Yeast Cell Cycle dataset, the human B-cell lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.
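The contrast between the Mean Squared Residue and linear correlation can be seen on a tiny example. The sketch below uses the standard Cheng and Church definition of the Mean Squared Residue and the Pearson coefficient. The expression values are invented, and this is not the paper's fitness function, only an illustration of why correlation captures scaling patterns that the residue penalizes.

```python
from math import sqrt


def mean_squared_residue(rows):
    """Cheng and Church residue: mean of (a_ij - a_iJ - a_Ij + a_IJ)^2."""
    n_rows, n_cols = len(rows), len(rows[0])
    row_means = [sum(r) / n_cols for r in rows]
    col_means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    total_mean = sum(row_means) / n_rows
    residue = 0.0
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            residue += (value - row_means[i] - col_means[j] + total_mean) ** 2
    return residue / (n_rows * n_cols)


def pearson(x, y):
    """Pearson linear correlation coefficient between two profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


base = [1.0, 3.0, 2.0, 5.0]          # invented expression profile
shifted = [v + 10.0 for v in base]   # shifting pattern
scaled = [3.0 * v for v in base]     # scaling pattern

print(mean_squared_residue([base, shifted]))  # zero: shifting is captured
print(mean_squared_residue([base, scaled]))   # positive: scaling is penalized
print(pearson(base, shifted), pearson(base, scaled))  # both close to 1
```

A bicluster whose rows differ only by a positive scale factor therefore looks poor under the residue but perfectly coherent under correlation.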

    Correlation–Based Scatter Search for Discovering Biclusters from Gene Expression Data

    Scatter Search is an evolutionary method that combines existing solutions to create new offspring, as the well-known genetic algorithms do. This paper presents a Scatter Search aimed at finding biclusters in gene expression data. However, biclusters with certain patterns are more interesting from a biological point of view. Therefore, the proposed Scatter Search uses a measure based on linear correlations among genes to evaluate the quality of biclusters. As is usual in the Scatter Search methodology, an improvement method is included, which avoids finding biclusters with negatively correlated genes. Experimental results from yeast cell cycle and human B-cell lymphoma datasets are reported, showing a remarkable performance of the proposed method and measure.
    Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C00; Junta de Andalucía P07-TIC-0261